(57) Multimedia content containing moving pictures and audio is divided into multiple scenes, and metadata is generated for each of the scenes obtained as a result of the division. It is possible to generate metadata containing scene structure information metadata that describes the hierarchical structure of the content in addition to scene section information and titles. Also, in order to perform re-generation of metadata more appropriately, the metadata re-generation is performed using metadata optimization hint information describing each type of descriptor contained in the metadata.
Description

TECHNICAL FIELD 

[0001] The present invention relates to a metadata editing apparatus, a metadata reproduction apparatus, a metadata delivery apparatus, a metadata search apparatus, a metadata re-generation condition setting apparatus, a content delivery apparatus, and a metadata delivery method, with which, for instance, multimedia content containing moving pictures and audio is divided into multiple scenes and metadata is generated for each of the scenes obtained as a result of the division.

BACKGROUND ART 

[0002] A conventional moving picture management apparatus is known which includes: means for dividing a video into multiple scenes and editing and creating at least one index as an assembly of section information necessary for reproduction of each scene, a scene number assigned to each scene, and a representative image of each scene; means for giving a title to each index for the purpose of searching; and means for searching for a desired index using the corresponding title and successively reproducing the scenes of the index in order of the scene numbers. With this construction, it becomes possible to reproduce only the necessary scenes by editing an index in which those scenes are arranged (see Japanese Patent Laid-Open No. 2001-028722 (page 1, FIG. 1), for instance).

[0003] With the moving picture management apparatus described above, however, metadata is created using only the section information necessary for scene reproduction, the scene number, and the scene representative image. Therefore, a problem remains in that the structure of the video data, such as its hierarchical structure, cannot also be managed.

[0004] Also, when searching for a registered image, the title given to the corresponding index is used, which has the disadvantage that an appropriate title must be input in order to obtain an appropriate search result.

[0005] The present invention has been made in order to solve the problems described above. Therefore, it is an object of the present invention to provide a metadata editing apparatus capable of generating metadata that is index information showing the structure and the like of content (video data, for instance) in addition to scene section information and titles.

[0006] It is another object of the present invention to provide a metadata reproduction apparatus, a metadata delivery apparatus, a metadata search apparatus, a metadata re-generation condition setting apparatus, a content delivery apparatus, and a metadata delivery method, with which it is possible to collect and reproduce only the scenes which a user wishes to watch using the metadata generated by the metadata editing apparatus, or to search for the scenes desired by the user using characteristic amounts or the like described in the metadata.

DISCLOSURE OF INVENTION 

[0007] A metadata editing apparatus according to the present invention is provided with: a scene division unit for dividing multimedia content containing at least one of moving pictures and audio into a plurality of scenes to generate scene section information metadata indicating a scene start position and a scene end position for each scene obtained as a result of the division; a scene description edit unit for performing hierarchical editing of each scene of the multimedia content based on the scene section information metadata sent from the scene division unit and generating scene structure information metadata describing a hierarchical structure of the multimedia content; and a metadata description unit for integrating the scene section information metadata and the scene structure information metadata and generating metadata describing contents and a structure of the multimedia content in accordance with a predetermined format.

[0008] Further, a metadata delivery apparatus according to the present invention is provided with: a hint information analysis unit for analyzing metadata optimization hint information describing a type and content of each descriptor contained in metadata; a metadata analysis/re-generation unit for analyzing metadata describing contents and a structure of multimedia content containing at least one of moving pictures and audio based on the analyzed metadata optimization hint information and a condition for metadata re-generation, and re-generating second metadata; and a metadata delivery unit for delivering the second metadata re-generated by the metadata analysis/re-generation unit to a client terminal.

[0009] Further, a metadata delivery method according to the present invention includes the steps of: analyzing metadata optimization hint information describing a type of each descriptor contained in metadata; re-generating second metadata by analyzing the metadata describing contents and a structure of multimedia content containing at least one of moving pictures and audio based on the analyzed metadata optimization hint information and a condition for re-generation of the metadata; and delivering the re-generated second metadata to a client terminal.
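The re-generation step of [0009] can be pictured with a minimal Python sketch. It assumes, purely for illustration, that the metadata is a flat list of descriptors, that the hint information is simply the set of descriptor types present, and that the re-generation condition is the set of types the client terminal wants; the `Descriptor` class, `regenerate()` and the type names are hypothetical and are not the format defined by the invention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    kind: str      # descriptor type (hypothetical names such as "text", "color")
    scene: str     # scene the descriptor belongs to
    value: object  # descriptor payload


def regenerate(metadata, hint_types, wanted_types):
    """Re-generate second metadata for a client terminal.

    hint_types:   descriptor types listed in the (hypothetical) optimization
                  hint information for this metadata.
    wanted_types: descriptor types permitted by the re-generation condition.
    """
    usable = set(hint_types) & set(wanted_types)
    if not usable:
        # The hint already shows that nothing requested is present,
        # so the metadata itself need not be analyzed at all.
        return []
    return [d for d in metadata if d.kind in usable]
```

The point of the hint in this sketch is the early return: the decision whether the full metadata needs to be parsed is made from the small hint alone.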
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] 

FIG. 1 is a block diagram showing a construction of a metadata editing apparatus according to a first embodiment of the present invention.

FIG. 2 shows a news video that is an example of a target of editing by the metadata editing apparatus according 
to the first embodiment of the present invention. 

FIG. 3 shows an example of scene section information metadata of a scene division unit of the metadata editing apparatus according to the first embodiment of the present invention.

FIG. 4 shows an example of scene structure information metadata of a scene description edit unit of the metadata 
editing apparatus according to the first embodiment of the present invention. 

FIG. 5 shows examples of screen images of a content reproduction/display unit and a user input unit of the metadata editing apparatus according to the first embodiment of the present invention.

FIG. 6 is a block diagram showing a construction of a metadata editing apparatus according to a second embodiment of the present invention.

FIG. 7 illustrates how the metadata editing apparatus according to the second embodiment of the present invention operates.

FIG. 8 is a block diagram showing a construction of a metadata reproduction apparatus according to a third embodiment of the present invention.

FIG. 9 illustrates how the metadata reproduction apparatus according to the third embodiment of the present 
invention operates. 

FIG. 10 is a block diagram showing a construction of a content delivery system according to a fourth embodiment of the present invention.

FIG. 11 shows content (in this case, a news video) structure information outputted from a metadata analysis unit of a metadata delivery server according to the fourth embodiment of the present invention.

FIG. 12 shows an example of a structure of content after restructuring by a metadata re-generation unit of the content delivery system according to the fourth embodiment of the present invention.

FIG. 13 is a block diagram showing a construction of a metadata delivery server according to a fifth embodiment of the present invention.

FIG. 14 shows an example of video content, with reference to which processing of metadata optimization hint information by the metadata delivery server according to the fifth embodiment of the present invention is described.

FIG. 15 shows how metadata is described in MPEG-7 by the metadata delivery server according to the fifth embodiment of the present invention.

FIG. 16 shows an example of a format of the metadata optimization hint information used by the metadata delivery server according to the fifth embodiment of the present invention.

FIG. 17 shows the metadata optimization hint information used by the metadata delivery server according to the fifth embodiment of the present invention.

FIG. 18 is a flowchart showing how a metadata analysis/re-generation unit of the metadata delivery server according to the fifth embodiment of the present invention operates.

FIG. 19 is another flowchart showing how the metadata analysis/re-generation unit of the metadata delivery server according to the fifth embodiment of the present invention operates.

FIG. 20 is a block diagram showing a construction of a metadata search server according to a sixth embodiment 
of the present invention. 

FIG. 21 is a flowchart showing how a metadata analysis unit of the metadata search server according to the sixth embodiment of the present invention operates.

FIG. 22 is a block diagram showing a construction of a client terminal according to a seventh embodiment of the 
present invention. 

FIG. 23 is a block diagram showing a construction of a content delivery server according to an eighth embodiment of the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION 

[0011] Embodiments of the present invention will now be described with reference to the accompanying drawings, with a first embodiment relating to a metadata editing apparatus, a second embodiment relating to another metadata editing apparatus, a third embodiment relating to a metadata reproduction apparatus, a fourth embodiment relating to a content delivery system, a fifth embodiment relating to a metadata delivery server, a sixth embodiment relating to a metadata search server, a seventh embodiment relating to a client terminal, and an eighth embodiment relating to a content delivery server.
First Embodiment 

[0012] In this first embodiment, a metadata editing apparatus will be described which divides multimedia content containing moving pictures and audio into multiple scenes and creates metadata (index information) containing descriptions of a scene hierarchical structure and characteristic amounts of each scene.

[0013] The metadata editing apparatus according to the first embodiment of the present invention will be described with reference to the accompanying drawings. FIG. 1 is a block diagram showing a construction of the metadata editing apparatus according to the first embodiment of the present invention. Note that in each drawing, the same reference numerals denote the same or equivalent portions.

[0014] Referring to FIG. 1, a metadata editing apparatus 100 includes a content reproduction/display unit 2, a scene division unit 3, a thumbnail image generation unit 4, a scene description edit unit 5, a text information giving unit 6, a characteristic extraction unit 7, a user input unit 8, and a metadata description unit 9.

[0015] The content reproduction/display unit 2 reproduces and displays multimedia content 10 that includes video data and audio data and is a target of editing. The scene division unit 3 divides the content into multiple scenes. The thumbnail image generation unit 4 extracts a representative frame of each scene as a thumbnail image. The scene description edit unit 5 hierarchically edits the scenes obtained as a result of the division by the scene division unit 3 through scene grouping, scene combining, scene deletion, generation of information that shows relations among the scenes, and the like. The text information giving unit 6 gives various types of text information to each scene. The characteristic extraction unit 7 extracts characteristics of each scene.

[0016] Also, the user input unit 8 receives input of designation information from a user and outputs it to the content reproduction/display unit 2, the scene division unit 3, the thumbnail image generation unit 4, the scene description edit unit 5, and the text information giving unit 6 as user input information 11.

[0017] Further, the metadata description unit 9 integrates scene section information metadata 12, scene thumbnail image information metadata 13, scene structure information metadata 14, text information metadata 15, and characteristic description metadata 16 outputted from the scene division unit 3, the thumbnail image generation unit 4, the scene description edit unit 5, the text information giving unit 6, and the characteristic extraction unit 7, respectively. The metadata description unit 9 then generates metadata 17 describing the contents and structure of the multimedia content in accordance with a specified format.

[0018] Next, the operation of the metadata editing apparatus according to the first embodiment will be described with reference to the accompanying drawings. FIG. 2 shows a construction of a news video that is an example of a target of editing by the metadata editing apparatus according to the first embodiment.

[0019] A case where the news video having the construction shown in FIG. 2 is edited will be described as an example.

[0020] First, the content reproduction/display unit 2 of the metadata editing apparatus 100 receives input of the multimedia content 10, such as video content, stored in a content storage unit (not shown) via a network or the like, and reproduces/displays the multimedia content 10 for editing.

[0021] When the user of the metadata editing apparatus 100 inputs positions for clipping a scene, that is, a scene start position and a scene end position, using the user input unit 8 while watching the reproduced video, the scene division unit 3 generates the scene section information metadata 12 showing the scene start position and the scene end position inputted by the user.
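A minimal Python sketch of what one entry of the scene section information metadata 12 might hold. It assumes, for illustration only, that positions are frame numbers with an inclusive end; the `SceneSection` class, `make_sections()` and the frame values in the usage line are hypothetical, not taken from FIG. 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneSection:
    """One entry of scene section information metadata (hypothetical layout)."""
    title: str
    start_frame: int  # scene start position
    end_frame: int    # scene end position (inclusive)

    def __post_init__(self):
        if self.end_frame < self.start_frame:
            raise ValueError("scene end position precedes scene start position")

    @property
    def length(self) -> int:
        return self.end_frame - self.start_frame + 1


def make_sections(clips):
    """Build metadata entries from user-designated (title, start, end) triples,
    ordered by scene start position."""
    sections = [SceneSection(title, start, end) for title, start, end in clips]
    return sorted(sections, key=lambda sec: sec.start_frame)
```

For example, `make_sections([("domestic news", 300, 899), ("news digest", 0, 299)])` returns the "news digest" entry first, regardless of the order in which the user clipped the scenes.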

[0022] FIG. 3 shows an example of the scene section information metadata generated by the scene division unit of 
the metadata editing apparatus according to the first embodiment. 

[0023] The scene section information metadata 12 shown in FIG. 3 was generated from the news video shown in FIG. 2. As shown in FIG. 3, the scene section information metadata 12 generated by the scene division unit 3 gives the scene start position and the scene end position of each scene clipped from the news video content, such as a "news digest" scene, a "domestic news" scene, and an "international news" scene.

[0024] On receiving a scene editing designation from the user via the user input unit 8, the scene description edit unit 5 performs hierarchical editing of the scenes successively clipped by the scene division unit 3 based on the scene section information metadata 12 from the scene division unit 3, and then outputs the scene structure information metadata 14. Here, scene hierarchical editing refers to scene grouping, scene re-division, scene combining, or scene deletion, for instance. Scene grouping refers to grouping scenes that are related to each other with respect to specific characteristics into a single group. For instance, as shown in FIG. 4, the "domestic news" scene, the "international news" scene, and the "financial news" scene of the news video shown in FIG. 2 are grouped into a single "news" group. Scene re-division refers to dividing a single scene into multiple scenes, and scene combining refers to generating a single scene by combining multiple scenes.
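Scene re-division and scene combining can be sketched as operations on scene sections. This minimal Python sketch assumes frame-numbered, inclusive sections and, as its own design choice, only allows combining scenes that are temporally contiguous; the `Scene` class and the `redivide()`/`combine()` helpers are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    title: str
    start: int  # first frame of the scene (inclusive)
    end: int    # last frame of the scene (inclusive)


def redivide(scene, cut, titles):
    """Scene re-division: split one scene into two at frame `cut`,
    which becomes the first frame of the second scene."""
    if not scene.start < cut <= scene.end:
        raise ValueError("cut must satisfy start < cut <= end")
    first, second = titles
    return [Scene(first, scene.start, cut - 1), Scene(second, cut, scene.end)]


def combine(scenes, title):
    """Scene combining: merge temporally contiguous scenes into one."""
    if not scenes:
        raise ValueError("nothing to combine")
    ordered = sorted(scenes, key=lambda s: s.start)
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt.start != prev.end + 1:
            raise ValueError("only contiguous scenes can be combined")
    return Scene(title, ordered[0].start, ordered[-1].end)
```

Because `combine()` sorts its input, re-dividing a scene and then combining the parts in any order restores the original section.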

[0025] FIG. 4 shows an example of the scene structure information metadata generated by the scene description 
edit unit of the metadata editing apparatus according to the first embodiment. 
[0026] The scene structure information metadata 14 shown in FIG. 4 describes the hierarchical structure of the video content generated as a result of the editing by the scene description edit unit 5. In FIG. 4, the top-level "news" scene is edited into multiple scenes, such as a "news digest" scene, a "news" scene, a "special" scene, and a "sports" scene, and this lower-level "news" scene is further hierarchically edited into a "domestic news" scene, an "international news" scene, and a "financial news" scene by the scene description edit unit 5 through scene editing such as scene grouping, scene re-division, and scene combining.
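The hierarchy of FIG. 4 can be modeled as a tree in which scene grouping inserts a new node above existing scenes. This is a minimal Python sketch; the `SceneNode` class, its `group()` and `outline()` methods, the root title "news video" and the left-to-right order of the scenes are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SceneNode:
    """A node of the scene hierarchy (hypothetical representation)."""
    title: str
    children: list[SceneNode] = field(default_factory=list)

    def group(self, title: str, members: set[str]) -> SceneNode:
        """Scene grouping: move the named child scenes under a new group node
        placed where the first of them used to be."""
        idx = [i for i, c in enumerate(self.children) if c.title in members]
        if not idx:
            raise ValueError("no matching scenes to group")
        grouped = SceneNode(title, [self.children[i] for i in idx])
        rest = [c for c in self.children if c.title not in members]
        rest.insert(idx[0], grouped)
        self.children = rest
        return grouped

    def outline(self, depth: int = 0) -> list[str]:
        """Depth-first, indented listing of the hierarchy."""
        lines = ["  " * depth + self.title]
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines
```

Starting from a flat list of the six scenes and grouping the three news scenes yields a top level of "news digest", "news", "special" and "sports", with the three news scenes one level below "news".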

[0027] Then, the scene structure information metadata 14 generated by the scene description edit unit 5, such as the metadata shown in FIG. 4, is outputted to the metadata description unit 9.

[0028] Meanwhile, the thumbnail image generation unit 4 generates a representative frame of each scene clipped by the scene division unit 3 as a thumbnail image based on the scene section information metadata 12 from the scene division unit 3, and outputs information concerning the generated thumbnail image as the thumbnail image information metadata 13 to the metadata description unit 9, in which the thumbnail image information metadata 13 is registered. Here, the user may select the thumbnail using the user input unit 8; alternatively, the head frame or frames clipped at fixed time intervals may be set automatically as representative frames, or each scene change point may be detected automatically and the frame at each detected point set as the representative frame. The thumbnail image information metadata 13 is information showing the position (such as the frame number or time) of the thumbnail in the video content or information giving the location (such as the URL) of the thumbnail image.
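The three automatic ways of choosing representative frames in [0028] can be sketched as follows. This is a minimal Python sketch assuming frame-numbered scenes; the function names, the `ThumbnailInfo` record and the fallback to the head frame when no change point falls inside the scene are assumptions, not part of the described apparatus.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThumbnailInfo:
    frame: int                 # position of the thumbnail in the content
    url: Optional[str] = None  # location of a stored thumbnail image, if any


def head_frame(start, end):
    """Use the first frame of the scene."""
    return [start]


def fixed_interval_frames(start, end, interval):
    """Use frames clipped every `interval` frames within the scene."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return list(range(start, end + 1, interval))


def change_point_frames(start, end, change_points):
    """Use detected scene change points inside the scene; fall back to the
    head frame when none falls inside (an assumption of this sketch)."""
    inside = [p for p in sorted(change_points) if start <= p <= end]
    return inside or head_frame(start, end)
```

Each returned frame number could then be wrapped in a `ThumbnailInfo`, optionally with a URL, to form an entry of the thumbnail image information metadata.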
[0029] Also, the characteristic extraction unit 7 extracts visual characteristic amounts possessed by each scene, such as motions, colors, or shapes of objects contained in the scene, from the scene based on the scene section information metadata 12 from the scene division unit 3. The extracted characteristic amounts are outputted to the metadata description unit 9 as the characteristic description metadata 16 and are registered therein.
[0030] Also, the text information giving unit 6 gives various types of text information designated by the user, such as a title, an abstract, a keyword, a comment, and scene importance, to each scene based on the scene section information metadata 12 from the scene division unit 3. Here, the text information may be given through the user's input using the user input unit 8 or automatically through analysis of audio information and captions contained in the content. The text information is outputted to the metadata description unit 9 and is registered therein as the text information metadata 15.

[0031] FIG. 5 shows examples of screen images displayed by the content reproduction/display unit 2 and the user input unit 8 of the metadata editing apparatus according to the first embodiment. In FIG. 5, a video reproduction screen G1 is an example of the screen image displayed by the content reproduction/display unit 2, on which the content to be edited is reproduced/displayed. Although not clearly shown in FIG. 5, as in an ordinary video reproduction apparatus, a user interface is also provided which includes buttons and the like for commanding "reproduction", "stop", "rewind", "fast forward", "frame advance", and other operations. Below the video reproduction screen G1, a scene division designation screen G2 in slider form, for instance, is displayed. While watching the video displayed on the video reproduction screen G1, the user designates a scene start position and a scene end position of that video through the scene division designation screen G2. At the same time, the user designates the position of a thumbnail between the scene start position and the scene end position through the scene division designation screen G2. When the thumbnail position is designated through the scene division designation screen G2, the thumbnail image generation unit 4 generates a thumbnail image from the frame of the video content at the designated position.

[0032] Also, the thumbnail image, whose position has been designated through the scene division designation screen 
G2, is displayed on a scene division information display screen G3 as scene division information. Here, on this scene 
division information display screen G3, it is also possible to display information showing the scene start position and 
the scene end position in addition to the thumbnail image, as shown in FIG. 3. 

[0033] Next, the user designates scene editing through a tree structure generation designation/display screen G4. That is, the user generates a tree showing the hierarchical structure of the video content while watching the scene division information, such as the thumbnail images, displayed on the scene division information display screen G3.

[0034] When performing scene grouping, the user uses a manipulation method in which, for instance, a new node is added to the tree and each scene to be grouped is added to that node. To add a scene, the user may select it on the scene division information display screen G3 and add it to the node by a drag-and-drop operation. Here, the user can input text information for the selected scene from the scene division information display screen G3 or the tree structure generation designation/display screen G4 using the user input unit 8, which serves as a user interface for giving text information to the scene via the text information giving unit 6.

[0035] The metadata description unit 9 generates a metadata file described in accordance with a specified description format by integrating the various types of metadata outputted from the scene division unit 3, the thumbnail image generation unit 4, the scene description edit unit 5, the text information giving unit 6, and the characteristic extraction unit 7. The specified metadata description format may be a uniquely determined format, although MPEG-7, standardized by ISO, is used in this first embodiment. MPEG-7 stipulates a format for describing the structure and characteristics of content and includes an XML file format and a binary format.
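As a rough illustration of how the metadata description unit 9 might serialize a scene hierarchy to XML, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. The element names loosely borrow from MPEG-7 (`VideoSegment`, `TemporalDecomposition`, `FreeTextAnnotation`), but the nesting is simplified and the `startFrame`/`endFrame` attributes are hypothetical; real MPEG-7 descriptions express time with MediaTime elements, so this output would not validate against the MPEG-7 schema.

```python
import xml.etree.ElementTree as ET


def add_segment(parent, title, start, end, children=()):
    """Append one scene (and, recursively, its sub-scenes) as a VideoSegment."""
    seg = ET.SubElement(parent, "VideoSegment",
                        startFrame=str(start), endFrame=str(end))
    annotation = ET.SubElement(seg, "TextAnnotation")
    ET.SubElement(annotation, "FreeTextAnnotation").text = title
    if children:
        decomposition = ET.SubElement(seg, "TemporalDecomposition")
        for child in children:
            add_segment(decomposition, *child)
    return seg


def describe(tree):
    """Serialize a (title, start, end, children) scene tree to an XML string."""
    root = ET.Element("Mpeg7")
    video = ET.SubElement(ET.SubElement(root, "Description"), "Video")
    add_segment(video, *tree)
    return ET.tostring(root, encoding="unicode")
```

Nesting a `TemporalDecomposition` inside each parent segment mirrors the scene structure information metadata 14, while the annotation text stands in for the text information metadata 15.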

[0036] As described above, the metadata editing apparatus 100 of the first embodiment is provided with the scene description edit unit 5 for hierarchically editing scenes and the characteristic extraction unit 7 for extracting characteristics from the scenes, so that it becomes possible to generate metadata describing the hierarchical structure possessed by content, such as video data, and characteristic amounts of each scene.

[0037] It should be noted here that the multimedia content 10 inputted into the content reproduction/display unit 2 is obtained from a content server (not shown) existing on a network, from a content storage unit (not shown) in the metadata editing apparatus 100, or from an accumulation medium (not shown) such as a CD or a DVD, for instance. In a like manner, the metadata outputted from the metadata description unit 9 is accumulated in a metadata server (not shown) existing on a network, in a metadata accumulation unit (not shown) in the metadata editing apparatus, or in an accumulation medium (not shown), such as a CD or a DVD, together with content, for instance.
[0038] Also, in the first embodiment, a case where the metadata editing apparatus 100 is provided with both the scene description edit unit 5 and the characteristic extraction unit 7 has been described. However, the present invention is not limited to this, and the metadata editing apparatus 100 may of course be provided with only one of the scene description edit unit 5 and the characteristic extraction unit 7.

Second Embodiment 

[0039] In the first embodiment described above, every scene is divided manually. In contrast, the metadata editing apparatus described in this second embodiment is provided with a scene change detection unit for automatically detecting each scene change point.

[0040] The metadata editing apparatus according to the second embodiment of the present invention will be described with reference to the accompanying drawings. FIG. 6 is a block diagram showing a construction of the metadata editing apparatus according to the second embodiment of the present invention.

[0041] Referring to FIG. 6, a metadata editing apparatus 100A includes a content reproduction/display unit 2, a scene division unit 3, a thumbnail image generation unit 4, a scene description edit unit 5, a text information giving unit 6, a characteristic extraction unit 7, a user input unit 8, a metadata description unit 9, and a scene change detection unit 39. Note that reference numeral 40 denotes automatically detected scene start position information.

[0042] Next, how the metadata editing apparatus according to the second embodiment operates will be described 
with reference to the accompanying drawings. 

[0043] FIG. 7 illustrates how the metadata editing apparatus according to the second embodiment of the present 
invention operates. 

[0044] The constituent elements other than the scene change detection unit 39 and the scene division unit 3 operate in the same manner as in the first embodiment described above. Therefore, only the operations unique to the second embodiment will be described below.

[0045] The scene change detection unit 39 automatically detects each scene change/cut point. This scene change detection is performed based on, for instance, a pixel difference between frames, a color difference between frames, or a difference in luminance histogram between frames. The scene division unit 3 determines a scene start position and a scene end position based on each scene change point detected by the scene change detection unit 39.

[0046] Hereinafter, processing by the scene change detection unit 39 and the scene division unit 3 will be described in detail, taking as an example a case where the content to be edited is a news video.

[0047] A case where a color histogram is used as the characteristic amounts for scene change detection will be described as an example.

[0048] The scene change detection unit 39 calculates a color histogram for each frame. Color systems such as HSV, RGB, and YCbCr are available; the HSV color space is used in this example. The HSV color space is composed of three elements: "hue (H)", "saturation (S)", and "value (V)". A histogram of each element is calculated. Next, from the obtained histograms, the difference in histogram between frames is calculated based on, for instance, Equation 1 given below. Here, it is assumed that the frames from the scene start frame to the Nth frame (N = 3, for instance) belong to the same scene, that is, do not contain any scene change point. As the initial characteristic amounts of the scene, the mean value (mean) and the standard deviation (sd) of the differences in histogram among the first N frames are obtained based on Equation 2 given below.

$$\mathrm{sum}_i = \sum_{k=1}^{\mathrm{bin\_H}} \left| H_i(k) - H_{i-1}(k) \right| + \sum_{k=1}^{\mathrm{bin\_S}} \left| S_i(k) - S_{i-1}(k) \right| + \sum_{k=1}^{\mathrm{bin\_V}} \left| V_i(k) - V_{i-1}(k) \right| \qquad \text{(Equation 1)}$$

sum_i: the sum of the differences in histogram between frame i and frame i-1
H_i(k): the hue histogram; bin_H: the number of elements of the histogram
S_i(k): the saturation histogram; bin_S: the number of elements of the histogram
V_i(k): the value histogram; bin_V: the number of elements of the histogram

$$\mathrm{mean} = \frac{1}{N-1} \sum_{i=1}^{N-1} \mathrm{sum}_i, \qquad \mathrm{sd} = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N-1} \left( \mathrm{sum}_i - \mathrm{mean} \right)^2} \qquad \text{(Equation 2)}$$

mean: the mean value of the differences in histogram between the frames
sd: the standard deviation of the differences in histogram between the frames
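Equations 1 and 2 translate directly into code. This minimal Python sketch uses the standard `colorsys` module for the RGB to HSV conversion and assumes frames are given as lists of (r, g, b) tuples in [0, 1]; the bin counts (16, 4, 4) and the function names are hypothetical choices, and frames are indexed so that frame 0 is the scene start frame.

```python
import colorsys
import math


def hsv_histograms(pixels, bins=(16, 4, 4)):
    """Per-element H, S, V histograms of one frame.

    pixels: iterable of (r, g, b) tuples with components in [0, 1].
    bins:   (bin_H, bin_S, bin_V); the default counts are hypothetical.
    """
    hists = [[0] * n for n in bins]
    for r, g, b in pixels:
        for hist, value, n in zip(hists, colorsys.rgb_to_hsv(r, g, b), bins):
            hist[min(int(value * n), n - 1)] += 1  # value 1.0 goes to the top bin
    return hists


def histogram_difference(cur, prev):
    """Equation 1: sum_i over the H, S and V histograms of frames i and i-1."""
    return sum(abs(a - b) for hc, hp in zip(cur, prev) for a, b in zip(hc, hp))


def initial_statistics(frames, n=3):
    """Equation 2: mean and sd of sum_1 .. sum_{N-1} over the first N frames."""
    hists = [hsv_histograms(f) for f in frames[:n]]
    diffs = [histogram_difference(hists[i], hists[i - 1]) for i in range(1, len(hists))]
    if not diffs:
        raise ValueError("need at least two frames")
    mean = sum(diffs) / len(diffs)
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / len(diffs))
    return mean, sd
```

With N = 3 there are N - 1 = 2 differences, so both averages in Equation 2 divide by 2, which is what `len(diffs)` gives.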

[0049] Then, each frame from the (N+1)th frame onward whose inter-frame difference in histogram is greater than "mean + λ·sd", where λ is a coefficient, is regarded as a scene change point and set as a new scene start position candidate.
[0050] In this manner, multiple scene start position candidates are obtained. Next, consider a case, such as a news video, where an image having a fixed pattern is inserted at each switch between news items.

[0051] In a news video, an image having a fixed pattern, such as an image composed of an announcer, a studio set in the background, and a character description (caption), is often inserted at each switch between news items. Accordingly, the image having this predetermined pattern (hereinafter referred to as the "template image"), or metadata describing the characteristic amounts of the template image, is registered in advance. The characteristic amounts of the template image are, for instance, the color histogram of the template image or its motion pattern (for instance, little motion is observed in the announcer area at a switch between news items).
[0052] When the template image is registered in advance, each image corresponding to a scene change point is matched against the template image, as shown in FIG. 7. If the similarity between them is high, the scene change point is registered as a scene start position. The similarity may be judged based on inter-frame differences, inter-frame differences in color histogram, or the like.
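The template check of [0052] could be sketched as follows. This minimal Python sketch swaps in normalized histogram intersection as the similarity measure (the text above only says similarity may be judged from inter-frame or color histogram differences), and the 0.8 threshold and all function names are hypothetical.

```python
def normalized(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    return [v / total for v in hist]


def similarity(h1, h2):
    """Histogram intersection of two normalized histograms, in [0, 1]."""
    return sum(min(a, b) for a, b in zip(normalized(h1), normalized(h2)))


def confirm_scene_starts(candidates, template_hist, threshold=0.8):
    """Keep the scene change points whose image resembles the template image.

    candidates: mapping of frame index -> color histogram of that frame.
    """
    return [frame for frame, hist in sorted(candidates.items())
            if similarity(hist, template_hist) >= threshold]
```

Normalizing first makes the comparison independent of frame size, so a template captured at one resolution can be matched against frames of another.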

[0053] When the characteristic amounts of the template image are registered in advance, the characteristic amounts of each image corresponding to a scene change point are extracted and matched against the characteristic amounts of the template image. If the similarity between them is high, the scene change point is registered as a scene start position. Information showing the scene start position is then outputted to the scene division unit 3.
[0054] The scene division unit 3 determines a scene start position and a scene end position based on the information showing the scene start position automatically detected by the scene change detection unit 39. Note that the scene division unit 3 of the second embodiment is also capable of determining the scene start position and the scene end position based on designation from the user, as in the first embodiment described above.
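One simple way for the scene division unit 3 to turn detected start positions into sections is sketched below in Python. It assumes, for illustration, that each scene ends one frame before the next start and that the last scene runs to the final frame of the content; `sections_from_starts()` is a hypothetical name.

```python
def sections_from_starts(starts, last_frame):
    """Convert detected scene start positions into (start, end) sections.

    Each scene is assumed to end one frame before the next scene starts,
    and the final scene ends at the last frame of the content.
    """
    ordered = sorted(set(starts))  # duplicates from several detectors collapse
    if not ordered or ordered[0] < 0 or ordered[-1] > last_frame:
        raise ValueError("start positions must lie within the content")
    ends = [s - 1 for s in ordered[1:]] + [last_frame]
    return list(zip(ordered, ends))
```

For starts `[0, 450, 120]` in a 1000-frame video, this yields `[(0, 119), (120, 449), (450, 999)]`.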

[0055] The scene change detection unit 39 can also detect each scene change point contained in a scene by referring to each scene start position and each scene end position described in the scene section information metadata 12 outputted from the scene division unit 3 to the scene change detection unit 39.

[0056] The scene description edit unit 5 re-divides or integrates the scenes automatically detected by the scene change detection unit 39 based on the scene section information metadata 12 from the scene division unit 3. Note that the details of the scene description edit unit 5 are the same as those in the first embodiment described above.

[0057] As described above, with the metadata editing apparatus 100A according to the second embodiment, it becomes possible to generate metadata describing the hierarchical structure possessed by content, such as video data, and characteristic amounts of each scene, as in the first embodiment described above. In addition, the scene change detection unit 39 is provided, so that it becomes possible to automatically detect each scene change point in content.

Third Embodiment 

[0058] In this third embodiment, a metadata reproduction apparatus will be described which performs summary
reproduction of images, searching, and the like using the metadata generated by the metadata editing apparatus
according to the first embodiment or the second embodiment described above.

[0059] The metadata reproduction apparatus according to the third embodiment of the present invention will be
described with reference to the accompanying drawings. FIG. 8 is a block diagram showing a construction of the
metadata reproduction apparatus according to the third embodiment of the present invention.
[0060] Referring to FIG. 8, a metadata reproduction apparatus 200 includes a metadata analysis unit 19, a structure
display unit 20, a thumbnail image display unit 21, a user input unit 22, a search unit 23, a search result display unit
24, a summary creation unit 25, a summary structure display unit 26, and a content reproduction unit 27.

[0061] The metadata analysis unit 19 performs analysis of metadata 28 describing the hierarchical scene structure
possessed by content, information concerning the thumbnail of each scene, the characteristic amounts of each scene,
and the like. The structure display unit 20 displays a scene structure 29 obtained as a result of the metadata analysis,
that is, the hierarchical structure of the content. The thumbnail image display unit 21 displays thumbnail image
information 30 obtained as a result of the metadata analysis.

[0062] With the user input unit 22, a user inputs search designation, reproduction designation, and the like. The
search unit 23 performs searching based on the search designation (search condition 31) from the user and the scene
characteristic amounts or text information 32 obtained from the metadata. The search result display unit 24 displays
a result 33 of the searching. The summary creation unit 25 creates a summary based on summary creation
designation (summary creation condition 34) from the user. The summary structure display unit 26 displays a structure
38 of summarized content. The content reproduction unit 27 reproduces/displays the content based on summary
information 35, content reproduction designation 36, and content 37 to be reproduced.

[0063] Next, how the metadata reproduction apparatus according to the third embodiment operates will be described 
with reference to the accompanying drawings. 
[0064] First, the metadata analysis unit 19 receives input of the metadata 28 describing the hierarchical scene
structure possessed by the content, information concerning the thumbnail of each scene, the characteristic amounts of each
scene, and the like, and performs analysis of the metadata.

[0065] In the third embodiment, it is assumed that the metadata 28 is metadata generated by the metadata description
unit 9 of the first embodiment or the second embodiment described above in a format stipulated by MPEG-7.
Consequently, the metadata is a text file written in XML or a binary file encoded in the binary format.

[0066] If the metadata 28 is written in XML, the metadata analysis unit 19 serves as an XML parser that performs
analysis of an XML file. On the other hand, if the metadata 28 is encoded in the binary format, the metadata analysis
unit 19 serves as a decoder that performs decoding of the metadata 28.

[0067] The structure display unit 20 receives input of a result of the analysis by the metadata analysis unit 19 and
displays the hierarchical scene structure 29 of the content. The scene structure of the content is displayed in a tree
form together with the title of each scene, as shown in FIG. 4.
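The XML parsing of [0066] and the tree display of [0067] can be sketched with Python's standard `xml.etree.ElementTree` parser. The XML used here is a deliberately simplified, hypothetical stand-in: nested `Segment` elements that carry a `title` attribute. It is not the actual MPEG-7 schema.

```python
import xml.etree.ElementTree as ET
from typing import List


def scene_tree_lines(xml_text: str) -> List[str]:
    """Return one indented line per scene, mirroring the nesting of Segment elements."""
    root = ET.fromstring(xml_text)
    lines: List[str] = []

    def walk(segment: ET.Element, depth: int) -> None:
        lines.append("  " * depth + segment.get("title", "(untitled)"))
        for child in segment.findall("Segment"):   # direct sub-scenes only
            walk(child, depth + 1)

    walk(root, 0)
    return lines


sample = """<Segment title="News">
  <Segment title="Politics"/>
  <Segment title="Sports">
    <Segment title="Soccer"/>
  </Segment>
</Segment>"""
print("\n".join(scene_tree_lines(sample)))
```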

[0068] The thumbnail image display unit 21 receives input of the result of the analysis by the metadata analysis unit
19 (thumbnail image information 30) and displays a list of thumbnail images of the content.

[0069] The search unit 23 receives search designation from the user via the user input unit 22 and searches for a
scene contained in the content. At this time, the user inputs a search condition by giving a keyword, a sample image,
or the like via the user input unit 22. Based on the scene characteristic amounts described in the metadata or the text
information 32 giving scene titles and the like, the search unit 23 searches for each scene matching the search condition 31
given by the user, such as the keyword or the characteristics of the sample image.
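The two kinds of search condition in [0069] are a keyword and a sample image. A minimal sketch of both follows, under several assumptions. Each scene's characteristic amounts are taken to be a plain numeric vector, and the sample image's features are given in the same form. "Close" means a Euclidean distance below a hypothetical limit. The `Scene` class and both function names are illustrative only.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Scene:
    scene_id: str
    title: str
    features: List[float] = field(default_factory=list)  # e.g. color/motion amounts


def search_by_keyword(scenes: Sequence[Scene], keyword: str) -> List[str]:
    """Case-insensitive substring match against the scene titles (text information)."""
    kw = keyword.lower()
    return [s.scene_id for s in scenes if kw in s.title.lower()]


def search_by_sample(scenes: Sequence[Scene], sample_features: Sequence[float],
                     max_distance: float) -> List[str]:
    """Rank scenes by Euclidean distance to the sample image's features; keep close ones."""
    scored = sorted((math.dist(s.features, sample_features), s.scene_id)
                    for s in scenes if len(s.features) == len(sample_features))
    return [scene_id for distance, scene_id in scored if distance <= max_distance]
```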
[0070] When the searching by the search unit 23 is finished, the search result display unit 24 receives input of the
result 33 of the searching by the search unit 23 and displays the search result. As one method of displaying
the search result, the thumbnail image of each scene matching the search condition is displayed, for instance.
[0071] Also, the summary creation unit 25 creates a summary of the content based on summary creation designation
from the user via the user input unit 22. At this time, the user inputs information showing the reproduction time of
summarized content, user preferences, and the like using the user input unit 22. When the content is a news video,
for instance, the user inputs preference information showing that he/she wishes to mainly watch the sports
news in the news video, or to watch a 20-minute summary of a news video whose original length is one hour. The
summary creation unit 25 then creates the summary information 35 matching the summary condition based on the
scene reproduction times described in the metadata and the text information 32 giving the scene titles and the like.
For instance, this summary information 35 is a reproduction list of the scenes contained in the summarized content, in
which the location information, such as the URL, of the content is written together with the start position and
end position of each scene in the content that the user wishes to reproduce.
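Below is a minimal sketch of summary creation that produces the reproduction list described above. Each scene is assumed to carry a title, start and end times in seconds, and an importance value. The selection rule is a greedy policy: the most important scenes are taken first until the time budget is used up. The patent does not specify this rule. The URL and all names are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class SceneInfo:
    title: str
    start: float       # seconds from the beginning of the content
    end: float
    importance: float  # 0.0 (lowest) .. 1.0 (highest)


def create_summary(url: str, scenes: Sequence[SceneInfo], max_duration: float,
                   keyword: Optional[str] = None) -> List[Tuple[str, float, float]]:
    """Build a reproduction list of (content URL, start, end) entries within a time budget."""
    candidates = [s for s in scenes if keyword is None or keyword.lower() in s.title.lower()]
    chosen: List[SceneInfo] = []
    used = 0.0
    for scene in sorted(candidates, key=lambda s: s.importance, reverse=True):
        length = scene.end - scene.start
        if used + length <= max_duration:   # greedy: skip scenes that would overflow
            chosen.append(scene)
            used += length
    chosen.sort(key=lambda s: s.start)       # play back in original order
    return [(url, s.start, s.end) for s in chosen]
```

A "20-minute summary" request corresponds to `max_duration=1200`. A "mainly sports" preference corresponds to `keyword="sports"`.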

[0072] Also, the content reproduction/display unit 27 specifies the target content based on the location information of
the content contained in the summary information 35, and obtains, reproduces, and displays each scene
to be reproduced based on the scene list contained in the summary information. In another form, the summary
information hierarchically describes the scene structure of the summarized content.

[0073] FIG. 9 shows an example of a hierarchical scene structure. FIG. 9(a) shows an example of a scene structure
of original content. Each scene is given an importance in the range of 0.0 to 1.0, with "1.0" meaning the highest importance
and "0.0" meaning the lowest importance. The importance is calculated based on the user preferences, for instance.
If the user preferences are registered in advance and indicate that he/she wishes to watch scenes of a soccer game
of team A and, in particular, always wants to see the result of the game and the goal scenes, each scene is given an importance
reflecting the user preferences.

[0074] Next, when summarization is performed using only the scenes having the highest importance in FIG. 9(a),
summarized content having the scene structure shown in FIG. 9(b) is generated. Note that each scene has
metadata showing the location information, such as the URL, of the content containing the scene, the position
information (the start position and the end position) of the scene in the content, and the like. Information concerning the
scene structure 38 of the summarized content is passed to the summary structure display unit 26, which then displays
the scene structure 38 in the tree form shown in FIG. 9(b).
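The pruning from FIG. 9(a) to FIG. 9(b) can be sketched as a recursive filter over the scene tree. This sketch rests on one assumption. A scene that reaches the importance threshold is kept whole. A scene below the threshold survives only as a parent of qualifying sub-scenes, so the hierarchy of the summary is preserved. `SceneNode` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SceneNode:
    title: str
    importance: float = 0.0
    children: List["SceneNode"] = field(default_factory=list)


def summarize(node: SceneNode, threshold: float = 1.0) -> Optional[SceneNode]:
    """Return a pruned scene tree that keeps only qualifying scenes, or None.

    A scene whose importance reaches the threshold is kept whole; a scene below
    it is rebuilt only as a container for its qualifying sub-scenes.
    """
    if node.importance >= threshold:
        return node
    kept = [c for c in (summarize(child, threshold) for child in node.children) if c is not None]
    if kept:
        return SceneNode(node.title, node.importance, kept)
    return None
```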

[0075] Also, when the user, via the user input unit 22, selects at least one scene that he/she wishes to reproduce using
the scene structure displayed by the structure display unit 20 or the summary structure display unit 26, or using the scene
thumbnails displayed by the thumbnail image display unit 21 or the search result display unit 24, the
content reproduction/display unit 27 reproduces/displays each selected scene contained in the content.
[0076] As described above, with the metadata reproduction apparatus 200 according to the third embodiment, it
becomes possible to reproduce only the scenes that the user wishes to watch, using the metadata generated by the
metadata editing apparatus according to the first embodiment or the second embodiment described above, or to search
for a scene desired by the user using the characteristic amounts described in the metadata.
[0077] In the third embodiment, the content reproduction/display unit 27 is provided within the metadata reproduction
apparatus 200. However, this content reproduction/display unit may be provided in another apparatus. For instance,
manipulation and display concerning reproduction of the metadata, such as display of the scene structure and
the thumbnail images, may be performed by a mobile telephone, a portable information terminal, or the like, while
processing and display concerning reproduction of the multimedia content may be performed by a terminal (a PC, for
instance) connected to the mobile telephone, the portable information terminal, or the like via a network.

Fourth Embodiment

[0078] In this fourth embodiment, a metadata delivery server (metadata delivery apparatus), which delivers the
metadata of content to a client terminal, and a content delivery server, which scalably constructs the content with reference
to the terminal capability of the client terminal and delivers the constructed content to the client terminal, will be
described.

[0079] A content delivery system according to the fourth embodiment of the present invention will be described with 
reference to the accompanying drawings. FIG. 10 is a block diagram showing a construction of the content delivery 
system according to the fourth embodiment of the present invention. 

[0080] Referring to FIG. 10, a content delivery system 300 includes a metadata delivery server 400, various client
terminals 481 to 48n, and a content delivery server 500.

[0081] The metadata delivery server 400 includes a metadata accumulation unit 41, a metadata analysis unit 42, a
terminal capability judgment unit 43, a metadata re-generation unit 44, and a metadata delivery unit 45.
[0082] In the metadata accumulation unit 41, the metadata generated by, for instance, the metadata editing
apparatus of the first embodiment or the second embodiment described above is accumulated. The metadata analysis
unit 42 performs analysis of metadata 49 describing the structure and characteristics of content. The terminal capability
judgment unit 43 judges the terminal capability of each client terminal based on information 51 concerning the capability
of the client terminal. The metadata re-generation unit 44 restructures the content in accordance with the judged
terminal capability of the client terminal based on a result 50 of the analysis of the metadata, and re-generates metadata
52 of the restructured content. The metadata delivery unit 45 delivers metadata 53 re-generated by the metadata
re-generation unit 44 to the client terminals 481 to 48n.

[0083] Note that the metadata accumulation unit 41 may be provided outside the metadata delivery server 400 of 
the fourth embodiment. In this case, the metadata delivery server 400 receives input of the metadata 49 from the 
metadata accumulation unit 41 via a network (not shown) or the like. 

[0084] On the other hand, the content delivery server 500 includes a content accumulation unit 46 and a content
delivery unit 47.

[0085] Content 55 is accumulated in the content accumulation unit 46. The content delivery unit 47 delivers
content 56 to the client terminals 481 to 48n in accordance with content delivery requests 54 from the client terminals.
[0086] As in the case of the metadata delivery server 400 described above, the content accumulation unit 46 may
be provided outside the content delivery server 500. In this case, the content delivery server 500 receives input of the
content data 55 via a network (not shown).

[0087] Next, how the content delivery system according to the fourth embodiment operates will be described with 
reference to the accompanying drawings. 

[0088] First, on the metadata delivery server 400 side, the metadata analysis unit 42 performs analysis of the
metadata accumulated in the metadata accumulation unit 41. The metadata analysis unit 42 operates in the same manner
as the metadata analysis unit 19 of the metadata reproduction apparatus 200 of the third embodiment described above.
By performing the analysis of the metadata, the metadata analysis unit 42 obtains information concerning the structure
and characteristics of the content.

[0089] FIG. 11 shows content structure information outputted from the metadata analysis unit of the metadata delivery
server according to the fourth embodiment, with the illustrated example relating to a news video. In FIG. 11, the
hierarchical scene structure of the content is displayed in a tree form. Each node of the tree corresponds to one scene
and is associated with various types of scene information. Here, the various types of scene information include a scene
title, an abstract, time information giving a scene start position and a scene end position, a scene thumbnail, a
representative frame, a thumbnail shot, a representative shot, and scene characteristics such as visual characteristic
amounts concerning colors, motions, and the like. Note that in FIG. 11, among the various types of scene information,
only the scene titles are shown.

[0090] Here, it is assumed that the client terminals are various information household devices having different
terminal capabilities. The terminal capability refers to a communication speed, a processing speed, an image format that
can be reproduced/displayed, an image resolution, a user input function, and the like. For instance, it is assumed that
the client terminal 481 is a personal computer (PC) that has sufficient performance with respect to the communication
speed, processing speed, display performance, and user input function. Also, it is assumed that the client terminal 482
is a mobile telephone and the remaining client terminals are each a PDA or the like. Each of the client terminals 481
to 48n sends information concerning its terminal performance.

[0091] The terminal capability judgment unit 43 analyzes the information 51, sent from each of the client
terminals 481 to 48n, that shows the terminal performance of the client terminal, determines a deliverable image format,
a maximum image resolution, a length of the content, and the like, and outputs them to the metadata re-generation
unit 44. When the original content is video content encoded in MPEG-2 and has a high resolution, for instance, the
original content can be reproduced by the client terminal 481 as it is, because the client terminal 481 has sufficient
performance as described above. It is also assumed that this client terminal 481 has a function with which it is possible
to perform the image summary reproduction and searching described in the third embodiment. On
the other hand, it is assumed that the client terminal 482 is capable of reproducing only short video shots encoded in
MPEG-4 and that the maximum resolution displayable by the client terminal 482 is low.
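The judgment in [0091] can be sketched as a function that turns a terminal's capability report into a delivery profile for the metadata re-generation step. Every field name, the 4000 kbps cut-off, and the rule for choosing a fallback format are hypothetical. They only illustrate the PC-versus-mobile-telephone contrast described above.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass
class TerminalInfo:                   # capability report sent by a client terminal (hypothetical fields)
    supported_formats: Set[str]
    max_resolution: Tuple[int, int]   # (width, height)
    bandwidth_kbps: int
    supports_search: bool


@dataclass
class DeliveryProfile:                # what the metadata re-generation step is told
    video_format: str
    max_resolution: Tuple[int, int]
    short_shots_only: bool
    include_features: bool


def judge_capability(term: TerminalInfo, source_format: str,
                     source_resolution: Tuple[int, int],
                     min_full_kbps: int = 4000) -> DeliveryProfile:
    """Decide whether the original content can be sent as-is or must be restructured."""
    fits = all(t >= s for t, s in zip(term.max_resolution, source_resolution))
    if source_format in term.supported_formats and fits and term.bandwidth_kbps >= min_full_kbps:
        return DeliveryProfile(source_format, source_resolution, False, term.supports_search)
    fallback_format = sorted(term.supported_formats)[0]   # deterministic pick for the sketch
    capped = (min(term.max_resolution[0], source_resolution[0]),
              min(term.max_resolution[1], source_resolution[1]))
    return DeliveryProfile(fallback_format, capped, True, term.supports_search)
```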

[0092] The metadata re-generation unit 44 restructures the content in accordance with the terminal performance of
each of the client terminals 481 to 48n reported by the terminal capability judgment unit 43, re-generates the
metadata 52 describing the structure and contents of the restructured content, and outputs the metadata 52 to the metadata
delivery unit 45. For instance, the original metadata is delivered to the client terminal 481 as it is, so that the restructuring
of the content is not performed. On the other hand, the client terminal 482 has only the function of reproducing short
video shots and is incapable of reproducing every scene, so that the restructuring of the content is performed for the
client terminal 482 using short video shots of important scenes.

[0093] FIG. 12 shows an example of a content structure after the restructuring by the metadata re-generation unit
of the content delivery system according to the fourth embodiment. As shown in FIG. 12, each important scene, out
of the scenes of the news video, is extracted and the content is restructured so as to include only the representative shot
or representative frame of each extracted scene. Also, the client terminal 482 does not have the search function
described in the third embodiment above, so that among the various types of scene information in the metadata, the
scene characteristic amounts used for searching need not be included. Therefore, the metadata re-generation unit
44 re-generates metadata describing only the structure of the restructured scenes and the position information of the
representative shots or representative frames of the scenes, and sends the metadata to the metadata delivery unit 45.
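The restructuring for the client terminal 482 can be sketched as a filter over scene records. The sketch keeps only important scenes and reduces each one to its title and the position of its representative shot. Characteristic amounts are dropped unless the terminal can search. The dictionary keys and the importance cut-off are hypothetical.

```python
from typing import Any, Dict, List


def regenerate_metadata(scenes: List[Dict[str, Any]], min_importance: float,
                        include_features: bool) -> List[Dict[str, Any]]:
    """Keep only important scenes, reduced to title and representative-shot position.

    Feature descriptors are retained only for terminals that can use them for search.
    """
    regenerated = []
    for scene in scenes:
        if scene.get("importance", 0.0) < min_importance:
            continue
        reduced = {"title": scene["title"], "representative_shot": scene["representative_shot"]}
        if include_features and "features" in scene:
            reduced["features"] = scene["features"]
        regenerated.append(reduced)
    return regenerated
```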
[0094] The metadata delivery unit 45 delivers the metadata 53 generated by the metadata re-generation unit 44 to
the client terminals 481 to 48n.

[0095] Each of the client terminals 481 to 48n analyzes the metadata 53 delivered by the metadata delivery unit 45 
and obtains scene structure information of the content. When a user of each of the client terminals 481 to 48n selects 
a scene that he/she wishes to reproduce, the client terminal transmits position information of the selected scene to the 
content delivery unit 47 of the content delivery server 500. 

[0096] On receiving the scene position information from each of the client terminals 481 to 48n, the content delivery
unit 47 of the content delivery server 500 obtains the corresponding content 55 from the content accumulation unit 46 and
delivers the content to each of the client terminals 481 to 48n. In the case of the client terminal 481, the content delivery
unit 47 sends a scene start position and a scene end position and delivers a corresponding scene of the original content.
On the other hand, in the case of the client terminal 482, the content delivery unit 47 sends the location information
(such as the URL) of a scene representative shot. Note that when the representative shot is not reproducible/displayable
by the client terminal 482 because of its image format, image resolution, image file size, or the like, the content delivery
unit 47 performs various kinds of processing, such as format conversion, resolution conversion, and reduction in file
size through content summarization, and sends the resultant data to the client terminal 482.
[0097] As described above, with the metadata delivery server 400 of the fourth embodiment, it becomes possible to
re-generate metadata in accordance with the capability of each of the client terminals 481 to 48n and to deliver the
re-generated metadata to each of the client terminals 481 to 48n.

[0098] It should be noted here that in FIG. 10, the metadata delivery server 400 and the content delivery server 500
are shown as separate apparatuses, but the present invention is not limited to this. For instance, the content delivery
server may be provided in the metadata delivery server, or the metadata delivery server may be provided in the content
delivery server. In addition, needless to say, the metadata delivery server and the content delivery server may be
provided in the same server. In this case, the terminal capability judgment unit 43 can easily
inform the content delivery unit 47 of the capability of each of the client terminals 481 to 48n, which makes it possible
to restructure the content through format conversion or the like in accordance with the capability of the client terminal
and to deliver the restructured content to each of the client terminals 481 to 48n.

[0099] Also, the fourth embodiment has been described by assuming that the metadata generated by the metadata
editing apparatus of the first embodiment or the second embodiment described above is accumulated in the metadata
accumulation unit 41. However, the present invention is not limited to this and, needless to say, metadata generated
by an apparatus other than the metadata editing apparatus of the first embodiment or the second embodiment described
above may be accumulated in the metadata accumulation unit 41.

Fifth Embodiment 

[0100] In this fifth embodiment, another example of the metadata delivery server described in the fourth
embodiment above will be described. The metadata delivery server of the fourth embodiment performs the metadata
re-generation based on the terminal information sent from each client terminal. In the fifth embodiment, however, in order
to perform the metadata re-generation more appropriately, the metadata delivery server (metadata delivery apparatus)
is provided with a metadata analysis/re-generation unit that performs the metadata re-generation using metadata
optimization hint information, that is, hint information for the metadata re-generation.

[0101] The metadata delivery server according to the fifth embodiment of the present invention will be described 
with reference to the accompanying drawings. FIG. 13 is a block diagram showing a construction of the metadata 
delivery server according to the fifth embodiment of the present invention. 

[0102] Referring to FIG. 13, a metadata delivery server 400A includes a hint information analysis unit 61, a metadata
analysis/re-generation unit 63, and a metadata delivery unit 45.

[0103] The hint information analysis unit 61 analyzes metadata optimization hint information 60 and outputs a result
of the analysis. The metadata analysis/re-generation unit 63 analyzes metadata 49 describing the structure and
characteristics of content based on the analyzed metadata optimization hint information 62 and a condition 65 concerning
metadata re-generation, such as information concerning the performance of the client terminals or user preferences,
and outputs restructured metadata 64. Then, the metadata delivery unit 45 delivers metadata 53 to the client terminals.
[0104] In the metadata accumulation unit 41 (see FIG. 10), the metadata 49 describing the structure and
characteristics of the content and the metadata optimization hint information 60, that is, hint information for the re-generation of
the metadata 49, are accumulated. Here, the metadata optimization hint information 60 for the re-generation of the
metadata 49 is information describing the types of information contained in the metadata 49, the amount of the
contained information, and the outline and complexity of the metadata 49.

[0105] Next, how the metadata delivery server according to the fifth embodiment operates will be described with 
reference to the accompanying drawings. 

[0106] The metadata optimization hint information 60 will be described in detail by taking, as an example, a case of
video content having the structure shown in FIG. 14.
[0107] A video content (Root) (Soccer game program) is broadly divided into two scenes (Scene1 and Scene2)
corresponding to the first half and the second half, and the first half scene is further divided into multiple scenes
(Scene1-1, Scene1-2, ..., Scene1-n) (such as goal scenes and corner kick scenes). In FIG. 14, the temporal hierarchical
structure among the scenes is indicated using a tree structure.

[0108] The metadata 49 corresponding to the video content describes the temporal hierarchical structure of the
content, that is, the temporal relations among the scenes, and the start times and lengths of the scenes. The metadata
49 also describes text information (such as a title, abstract, category, and explanatory notes), importance, and the like
of each scene, as well as the characteristics (for instance, a color histogram or motion complexity) possessed by the
scene in accordance with the hierarchical level of the scene. Note that in this fifth embodiment, it is assumed that
MPEG-7, standardized by ISO, is used as the metadata description format.
[0109] FIG. 15 shows how the metadata is described in MPEG-7. In MPEG-7, each scene is described in units called
"video segments". In each video segment, time information (scene start point and length), a title,
an outline, a category, and the like are described. Note that the information described in each video segment
may change in accordance with the hierarchical level of the video segment. In the example shown in FIG. 15, importance
is described in each video segment at Level 2 and Level 3, although no importance is described in each video segment
at Level 4. Also, the characteristic amounts concerning colors and motions are described only in each video segment
at Level 4.

[0110] The temporal hierarchical relations among the scenes can be expressed by recursively describing the
video segments. In the description example shown in FIG. 15, a "time division" description expresses a
state where one video segment is composed of multiple temporally divided video segments. In MPEG-7, it is also possible
to describe the spatial hierarchical structure possessed by the content in a like manner. In this case, instead of the
"time division" description, a "space division" description is used to express a state where one segment is composed
of multiple spatially divided segments.

[0111] The metadata optimization hint information 60 for the re-generation of the metadata 49 describes the types
and contents of information (descriptors) contained in the metadata 49. Accordingly, in the case of the metadata shown
in FIG. 15, the metadata optimization hint information 60 contains a descriptor ("time division") expressing the temporal
hierarchical structure possessed by the content, descriptors expressing the color histogram and the motion complexity,
and descriptors expressing the title, abstract, category, and importance. Also, in order to express the description contents
and complexity, the depth of each video segment in the hierarchical structure is expressed with up to four levels (Level
1 to Level 4). Further, the importance assumes one of five discrete values ({0.0, 0.25, 0.5, 0.75, 1.0}). As importance
with respect to viewpoints, importance from the viewpoint of "Team A" and importance from the viewpoint of "Team B"
are described. The hint information also records the hierarchical position at which the importance appears (the video
segment level at which the importance is described).

[0112] FIG. 16 shows an example of a format of the metadata optimization hint information 60. The metadata
optimization hint information 60 shown in FIG. 16 contains metadata file information and metadata construction element
information.

[0113] The metadata file information describes information for predicting the resources required to process the metadata,
such as the memory size required to accumulate/analyze the metadata and the processing system (S/W) required to
analyze the metadata. In more detail, for instance, the metadata file information describes the location of a metadata
file, the size of the metadata file, the format of the metadata file (for instance, the XML format or the binary format),
syntax file information (the location of a syntax file defining the syntax of the metadata), and an appearing element number
showing the number of elements contained (appearing) in the metadata. Note that when the metadata is described in
the XML format, the syntax file defining the format of the metadata file corresponds to a DTD file, a schema file, or the
like defining the description format (syntax) of the metadata, and the syntax file information describes the location of
the DTD file or the schema file, for instance.
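Part of the metadata file information can be sketched with the standard XML parser: the location, size, format, and appearing element number. The sketch uses a crude heuristic that treats anything the parser rejects as the binary format. This is an assumption, since a real system would identify the format explicitly. Syntax file information is omitted.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def metadata_file_info(location: str, data: bytes) -> Dict[str, Any]:
    """Collect resource-prediction fields for one metadata file (sketch)."""
    info: Dict[str, Any] = {"location": location, "size_bytes": len(data)}
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # Heuristic: not well-formed XML, so treat it as the binary format.
        info["format"] = "binary"
        info["element_count"] = None   # counting would need a binary-format decoder
        return info
    info["format"] = "XML"
    info["element_count"] = sum(1 for _ in root.iter())   # appearing element number
    return info
```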

[0114] The metadata construction element information is information describing the type and contents of each
descriptor constituting the metadata. In more detail, the metadata construction element information contains the name
of each descriptor contained in the metadata, the appearing frequency (number of appearances) of the descriptor
in the metadata, and a description (completeness of description) showing whether or not the descriptor contains every
descriptor that it could syntactically contain. In addition, when the descriptor is recursively
described, the metadata construction element information also contains the temporal or spatial hierarchical property
(maximum value of the depth) possessed by the descriptor. In the case of the metadata description shown in FIG. 15,
for instance, "video segment" is a recursively described descriptor and has a hierarchical structure with up to four
levels, so that the maximum hierarchical depth possessed by the "video segment" descriptor is four.
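Two fields of the metadata construction element information can be derived from parsed XML. These are the appearing frequency of each descriptor and the maximum depth of a recursively described descriptor. The element names `Segment`, `TimeDivision`, and `Title` are simplified, hypothetical stand-ins for the MPEG-7 names. Wrapper elements between nested segments add no depth.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_frequencies(root: ET.Element) -> Counter:
    """Appearing frequency of every descriptor (element) name in the metadata."""
    return Counter(elem.tag for elem in root.iter())


def max_recursive_depth(elem: ET.Element, tag: str) -> int:
    """Maximum nesting depth of `tag` on any path from `elem` downward.

    Intermediate wrapper elements (such as a time-division element) add no depth.
    """
    own = 1 if elem.tag == tag else 0
    return own + max((max_recursive_depth(child, tag) for child in elem), default=0)
```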

[0115] In addition, for a descriptor contained in a recursively described descriptor, the hint information also
describes the appearing position (hierarchical level) at which the contained descriptor appears. For instance, "importance"
is a descriptor contained in the "video segment" descriptor and, when the "importance" is contained in the video segments
up to Level 3, that is, is not contained in the video segments at Level 4, the appearing position of the "importance"
is up to Level 3. In this manner, the appearing position is specified using the hierarchical level. However, when
an ID is assigned to each "video segment" containing the "importance" or to the "video segment" itself, it is also possible
to describe the appearing position as an ID list. Also, in the case of a descriptor having a value, the hint information
additionally describes the type of the descriptor and the range of values that the descriptor can assume. When
the importance is expressed using the five discrete values ({0.0, 0.25, 0.5, 0.75, 1.0}) with respect to each of the
viewpoints of "Team A" and "Team B", for instance, the assumable values of the "importance" are the list {0.0,
0.25, 0.5, 0.75, 1.0} in floating-point form. The above description is repeated for each descriptor that is a
construction element of the metadata.
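The appearing position and value range of a contained descriptor can be collected in a single walk over the segment tree. This sketch counts levels so that the outermost segment is Level 1. It assumes the descriptor's value is numeric text directly inside its element. The element names are simplified, hypothetical stand-ins for the MPEG-7 names.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def descriptor_profile(root: ET.Element, segment_tag: str,
                       descriptor_tag: str) -> Tuple[List[int], List[float]]:
    """Return (levels at which the descriptor appears, sorted set of its values).

    A segment's level is the number of segments enclosing it, itself included,
    so the outermost segment is Level 1.
    """
    levels, values = set(), set()

    def walk(elem: ET.Element, level: int) -> None:
        if elem.tag == segment_tag:
            level += 1
        elif elem.tag == descriptor_tag and elem.text is not None:
            levels.add(level)
            values.add(float(elem.text))
        for child in elem:
            walk(child, level)

    walk(root, 0)
    return sorted(levels), sorted(values)
```

For metadata shaped like FIG. 15, the deepest level returned would be the "up to Level 3" appearing position.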

[0116] FIG. 17 shows an example of the metadata optimization hint information described in accordance with the
format shown in FIG. 16. It can be seen that the example of the metadata optimization hint information 60 shown in
FIG. 17 contains the metadata file information and the metadata construction element information for descriptors such
as the "video segment" descriptor and the "title" descriptor.

[0117] Next, a method for performing re-generation of metadata using the metadata optimization hint information 60 
will be described with reference to FIG. 13. 
[0118] The hint information analysis unit 61 analyzes the metadata optimization hint information 60 described in the
specified format. The metadata analysis/re-generation unit 63 analyzes the metadata 49 using the analyzed metadata
optimization hint information 62 outputted from the hint information analysis unit 61, and outputs the metadata 64
re-generated based on the condition 65 concerning the metadata re-generation.
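The work of the hint information analysis unit 61 can be sketched as below, assuming (hypothetically) that the hint information is serialized as XML. The element and attribute names (`Descriptor`, `maxLevel`, `ids`, `Values`, and so on) and the function `analyze_hint` are invented for illustration and are not the format of FIG. 16; only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical hint-information document (not the FIG. 16 format).
HINT_XML = """
<MetadataOptimizationHint>
  <MetadataFile location="meta.xml" size="120000" elements="512"/>
  <Descriptor name="VideoSegment" count="64" maxLevel="4"/>
  <Descriptor name="Importance" count="20" maxLevel="3" type="float">
    <Values>0.0 0.25 0.5 0.75 1.0</Values>
  </Descriptor>
  <Descriptor name="Title" count="2" ids="seg1 seg3"/>
</MetadataOptimizationHint>
"""


def analyze_hint(xml_text):
    """Return {descriptor name: hint dict} parsed from the hypothetical format."""
    root = ET.fromstring(xml_text.strip())
    hints = {}
    for d in root.findall("Descriptor"):
        values = d.findtext("Values")          # None when no value range is given
        hints[d.get("name")] = {
            "count": int(d.get("count")),
            "max_level": int(d.get("maxLevel")) if d.get("maxLevel") else None,
            "ids": d.get("ids", "").split(),   # appearing position as an ID list
            "type": d.get("type"),
            "values": [float(v) for v in values.split()] if values else [],
        }
    return hints


hints = analyze_hint(HINT_XML)
```

The resulting dictionary plays the role of the analyzed metadata optimization hint information 62 in the later sketches.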
[0119] FIG. 18 shows an example of a method with which the metadata analysis/re-generation unit 63 analyzes the
metadata using the analyzed metadata optimization hint information 62. In this example, it is assumed that only video
segments having an importance of 0.5 or higher are extracted from the original metadata 49, and metadata composed
only of descriptions of the extracted video segments is re-generated.

[0120] First, the metadata analysis/re-generation unit 63 specifies the descriptors necessary for re-generation based on
the condition 65 for metadata re-generation (step S1). Since only video segments having an importance of 0.5 or higher
are extracted in this example, "importance" and "video segment" are the descriptors necessary for the re-generation.

[0121] Next, by referring to the analyzed metadata optimization hint information 62, it is judged whether or not the
descriptors specified in step S1 are contained in the metadata 49 (step S2). The following description takes the
"importance" descriptor as an example.

[0122] When the "importance" descriptor is contained in the metadata, the metadata is analyzed (step S3). When the
"importance" descriptor is not contained, the metadata analysis processing is ended (step S4).

[0123] Also, when the analyzed metadata optimization hint information 62 specifies that the appearing position of the
"importance" descriptor is up to Level 3 of the hierarchical structure, the analysis processing is ended as soon as the
analysis of the video segments up to Level 3 is finished (step S5), without analyzing Level 4 and the deeper hierarchical
levels (step S6).

[0124] Similarly, when the analyzed metadata optimization hint information 62 specifies that the "importance" descriptor
appears 20 times, the analysis of the metadata is ended (step S6) as soon as 20 "importance" descriptors have been
analyzed (step S5). After the metadata analysis processing is ended in step S4 or step S6, the operations in step S1
and the following steps are repeated to analyze another piece of metadata 49 if necessary.
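Steps S1 to S6 can be sketched as a pruned depth-first traversal. This is a minimal illustration, not the apparatus itself: it assumes the metadata has already been parsed into nested Python dictionaries with hypothetical keys `id`, `importance`, and `children`, that Level 1 is the top level of video segments, and that the hint for "importance" is a dictionary with optional `max_level` and `count` keys (all names are illustrative).

```python
def extract_important(segments, hint, threshold=0.5):
    """Collect IDs of video segments whose importance is >= threshold.

    segments: Level-1 segments as dicts {"id", "importance"?, "children"?}.
    hint: analysed hint for "importance" with optional "max_level" and
          "count" keys, or None when the descriptor is absent.
    """
    if hint is None:                      # step S2 -> S4: nothing to analyse
        return []
    max_level = hint.get("max_level")
    remaining = hint.get("count")
    selected = []
    stack = [(seg, 1) for seg in reversed(segments)]  # depth-first, in order
    while stack:                          # step S3: analyse the metadata
        seg, level = stack.pop()
        value = seg.get("importance")
        if value is not None:
            if value >= threshold:
                selected.append(seg["id"])
            if remaining is not None:
                remaining -= 1
                if remaining == 0:        # every occurrence seen (S5 -> S6)
                    break
        # Do not descend below the deepest level that can hold "importance"
        # (S5 -> S6).
        if max_level is None or level < max_level:
            stack.extend((c, level + 1) for c in reversed(seg.get("children", [])))
    return selected


metadata = [
    {"id": "s1", "importance": 1.0, "children": [
        {"id": "s1.1", "importance": 0.25},
        {"id": "s1.2", "importance": 0.75, "children": [
            {"id": "s1.2.1", "importance": 0.5, "children": [
                {"id": "s1.2.1.1"}]}]}]},
    {"id": "s2", "importance": 0.0},
]
print(extract_important(metadata, {"max_level": 3, "count": 5}))
# ['s1', 's1.2', 's1.2.1']
```

Both stopping rules only save work: with a correct hint, the result is the same as a full traversal, but the Level 4 segment is never visited.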

[0125] FIG. 19 shows another example of the method for analyzing the metadata using the analyzed metadata
optimization hint information 62. In this example, it is assumed that metadata is re-generated by extracting only the
video segments containing the "title" descriptor. Whether or not the metadata contains the "title" descriptor is judged
in the same manner as in the example shown in FIG. 18.

[0126] When the metadata contains the "title" descriptor, the metadata analysis/re-generation unit 63 judges whether
or not a video segment matches an appearing position ID described in the metadata optimization hint information 62
(step S13).

[0127] When the video segment does not match the ID, the video segment does not contain the "title" descriptor, so
the analysis of the description of this video segment is skipped (step S16).

[0128] On the other hand, when the video segment matches the ID, the description of this video segment is analyzed
in order to obtain the "title" descriptor (step S15).
[0129] Next, when the analysis of every video segment matching the appearing position IDs is finished (step S17), no
video segment containing the "title" descriptor remains in the metadata, so the analysis processing is ended
(step S18).

[0130] In order to analyze another piece of metadata if necessary, the operations in step S11 and the following steps
are repeated. Then, the metadata 64 restructured using the descriptors extracted through the analysis processing
described above is outputted.
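The ID-based flow of FIG. 19 can be sketched as follows, assuming the same hypothetical nested-dictionary representation (keys `id`, `title`, `children`) and that the hint supplies the appearing position of "title" as a list of segment IDs. The function name `extract_titles` is illustrative. Non-matching segments are not analyzed, but the sketch still walks into their children, since a listed ID may be nested below them.

```python
def extract_titles(segments, title_ids):
    """Analyse only the video segments listed as appearing positions of "title"."""
    pending = set(title_ids)
    titles = {}
    stack = list(reversed(segments))
    while stack and pending:              # step S17 -> S18: stop once all IDs are seen
        seg = stack.pop()
        if seg["id"] in pending:          # step S13: ID matches
            titles[seg["id"]] = seg.get("title")   # step S15: analyse this segment
            pending.discard(seg["id"])
        # Step S16: a non-matching segment's own description is skipped; only its
        # children are queued because a listed ID may be nested inside it.
        stack.extend(reversed(seg.get("children", [])))
    return titles


metadata = [
    {"id": "seg1", "title": "Kick-off", "children": [
        {"id": "seg2"},
        {"id": "seg3", "title": "Goal"}]},
    {"id": "seg4"},
]
print(extract_titles(metadata, ["seg1", "seg3"]))
# {'seg1': 'Kick-off', 'seg3': 'Goal'}
```

In this example the loop ends before `seg4` is reached, because every listed ID has already been found.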

[0131] The metadata delivery unit 45 delivers the restructured metadata 64 to the client terminals. 
[0132] Although not shown, the location of the metadata file, the size of the metadata file, the number of elements
appearing in the metadata, and the information concerning the metadata construction elements also change after the
metadata re-generation. Accordingly, metadata optimization hint information corresponding to the re-generated
metadata may also be re-generated.

[0133] Conventionally, every descriptor contained in metadata has had to be analyzed for metadata re-generation. In
the fifth embodiment, however, the descriptors of the metadata 49 are analyzed in the manner described above using
the metadata optimization hint information 60, which describes the list of descriptors contained in the metadata 49,
the appearing position of each descriptor, the number of appearing times, and the like. As a result, it becomes possible
to omit the analysis of the metadata 49 itself for the metadata re-generation. Also, the analysis of each descriptor not
matching the re-generation condition is omitted using the appearing position or the number of appearing times, so it
becomes possible to reduce the processing cost (such as the processing amount and the memory usage amount)
required to perform the metadata analysis and re-generation.
Sixth Embodiment

[0134] The fifth embodiment described above deals with the metadata delivery server that uses the metadata
optimization hint information to reduce the processing cost required for the metadata analysis and re-generation. In
this sixth embodiment, a metadata search server (metadata search apparatus) will be described which uses the
metadata optimization hint information to reduce the processing cost required for metadata searching.

[0135] The metadata search server according to the sixth embodiment of the present invention will be described with
reference to the accompanying drawings. FIG. 20 is a block diagram showing a construction of the metadata search
server according to the sixth embodiment of the present invention.

[0136] Referring to FIG. 20, a metadata search server 600 includes a hint information analysis unit 61, a metadata
analysis unit 71, and a search unit 73.

[0137] The hint information analysis unit 61 is the same as that in the fifth embodiment described above, and therefore
the description thereof is omitted in this embodiment. The metadata analysis unit 71 analyzes an enormous amount of
metadata 49 describing the structure and characteristics of content efficiently and at a lower processing cost, using
the analyzed metadata optimization hint information 62 and a search condition 70. The search unit 73 searches for
content matching the search condition using a result 72 of the analysis of the metadata.

[0138] Next, how the metadata search server according to the sixth embodiment operates will be described with
reference to the accompanying drawings.
[0139] FIG. 21 is a flowchart showing how the metadata analysis unit of the metadata search server according to
the sixth embodiment operates.

[0140] The metadata analysis unit 71 analyzes at least one piece of metadata using the metadata optimization hint
information 62 corresponding to the metadata. In this example, analyzing the metadata means extracting from the
metadata each characteristic description necessary for the search. For instance, when a video segment having specific
color characteristic amounts is given as the search condition and video segments having characteristics close to those
of the given video segment are to be searched for, each video segment having a color characteristic description must
be extracted. In the metadata example shown in FIG. 15, a color characteristic description ("color histogram") is added
to each video segment at Level 4, so the description of each video segment at Level 4 is extracted.

[0141] The metadata analysis unit 71 analyzes the search condition 70 and specifies the descriptors that are valid for
the search (step S21). The search condition may be characteristic amounts described in a format defined in MPEG-7,
or it may be an image, a keyword, or the like. When characteristic amounts described in the format defined in MPEG-7
(color arrangement information, for instance) are given as the search condition, each corresponding descriptor (color
arrangement information) becomes a descriptor that is valid for the search. When a keyword is given as the search
condition, each descriptor in text form (such as a title, an abstract, or explanatory notes) becomes a descriptor that is
valid for the search.

[0142] Next, by referring to the metadata optimization hint information 62, it is judged whether the selected descriptor
is contained in the metadata 49 (step S22). When the selected descriptor is not contained in the metadata 49, the
analysis processing of the metadata 49 is ended (step S24), and another piece of metadata 49 is analyzed if necessary.

[0143] On the other hand, when the selected descriptor is contained in the metadata 49, the metadata is analyzed
(step S23). As in the fifth embodiment described above, the metadata analysis processing shown in FIG. 18 or 19 is
performed efficiently using the metadata optimization hint information 62 (steps S25 and S26). Through these
operations, the metadata analysis unit 71 extracts each characteristic description necessary for the search.
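Steps S21 and S22 amount to intersecting the descriptors that suit the search condition with the descriptors the hint information says are present. The sketch below assumes a hypothetical table `VALID_DESCRIPTORS` keyed by condition kind; the kind labels and descriptor names are illustrative and are not taken from MPEG-7 or from the specification.

```python
# Hypothetical mapping from search-condition kind to valid descriptors (step S21).
VALID_DESCRIPTORS = {
    "color_layout": {"ColorLayout"},
    "color_histogram": {"ColorHistogram"},
    "keyword": {"Title", "Abstract", "Annotation"},
}


def descriptors_to_analyze(condition_kind, hint_descriptors):
    """Return the valid descriptors present in the metadata (step S22).

    An empty result means the metadata can be skipped without being parsed
    (step S24); otherwise only these descriptors need analysis (step S23).
    """
    valid = VALID_DESCRIPTORS.get(condition_kind, set())
    return valid & set(hint_descriptors)
```

Because the check reads only the hint information, a piece of metadata that lacks every useful descriptor is rejected without opening the metadata file itself.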

[0144] The search unit 73 searches for content matching the search condition using the metadata analysis result 72
(the characteristic descriptions necessary for the search) outputted from the metadata analysis unit 71. In this example,
the metadata analysis unit 71 outputs a description of each video segment having a color characteristic description
("color histogram"), so the search unit 73 judges whether each of these video segments is compatible with the color
characteristic amounts (histogram) given as the search condition, and outputs information concerning each video
segment judged to be compatible ("time information", for instance) as a search result 74.
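The specification does not say how compatibility between histograms is judged. As one possible choice, the sketch below uses histogram intersection with a fixed threshold; the measure, the 0.8 threshold, the assumption of normalized histograms, and the dictionary keys `histogram` and `time` are all assumptions for illustration.

```python
def histogram_intersection(h1, h2):
    """Similarity of two normalised histograms of equal length (1.0 = identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def search_segments(segments, query_hist, threshold=0.8):
    """Return the time information of each segment judged compatible."""
    results = []
    for seg in segments:
        if histogram_intersection(seg["histogram"], query_hist) >= threshold:
            results.append(seg["time"])   # e.g. (start, end) in seconds
    return results


segments = [
    {"histogram": [0.5, 0.5, 0.0], "time": (0, 10)},
    {"histogram": [0.0, 0.2, 0.8], "time": (10, 20)},
]
print(search_segments(segments, [0.6, 0.4, 0.0]))
# [(0, 10)]
```

Any other distance, such as an L1 or Euclidean distance between histograms, could be swapped in without changing the surrounding flow.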

[0145] As described above, in the sixth embodiment, the analysis of the metadata 49 is performed using the metadata
optimization hint information 60, so it becomes possible to omit the analysis of the metadata 49 itself for the metadata
search. Also, the analysis of each descriptor that is not necessary for the search is omitted based on the appearing
position and the number of appearing times, so it becomes possible to reduce the processing cost (such as the
processing amount and the memory usage amount) required to perform the metadata search.
Seventh Embodiment

[0146] The fifth and sixth embodiments described above deal with a server side that uses the metadata optimization hint
information. In this seventh embodiment, a client terminal (metadata re-generation condition setting apparatus) will be
described which uses the metadata optimization hint information.
[0147] The client terminal according to the seventh embodiment of the present invention will be described with ref- 
erence to the accompanying drawings. FIG. 22 is a block diagram showing a construction of the client terminal according 
to the seventh embodiment of the present invention. 

[0148] Referring to FIG. 22, a client terminal 48A includes a hint information analysis unit 80 and a metadata
re-generation condition setting unit 82.

[0149] It should be noted here that FIG. 22 shows only a portion of the function of the client terminal 48A that relates 
to means for setting a condition for metadata re-generation using metadata optimization hint information 60. 
[0150] Next, how the client terminal according to the seventh embodiment operates will be described with reference
to the accompanying drawing.

[0151] The hint information analysis unit 80 analyzes the metadata optimization hint information 60 described in a
specified format. This hint information analysis unit 80 is the same as that in the fifth embodiment described above,
and therefore the detailed description thereof is omitted in this embodiment.

[0152] Then, the metadata re-generation condition setting unit 82 sets a condition 83 for metadata re-generation based
on a result 81 of the analysis outputted from the hint information analysis unit 80. Here, setting the condition means,
for instance, selecting the descriptors that are unnecessary for the client terminal 48A from among the various
descriptors contained in the metadata optimization hint information 60. When the client terminal 48A is not provided
with a search function using characteristic amounts, for instance, the descriptors expressing characteristic amounts,
such as a color histogram or motion complexity, are unnecessary.

[0153] As another example of the condition setting, when the complexity of the metadata increases with the depth of
the hierarchical structure describing the relations between scenes of the content, the depth of the hierarchical
structure processible by the client terminal is set based on the maximum depth of the hierarchical structure described
in the metadata optimization hint information 60. In still another example, a user's viewpoint and a threshold value of
scene importance are set based on the assumable values of the importance described in the metadata optimization
hint information 60.
[0154] When the importance takes the five discrete values ({0.0, 0.25, 0.5, 0.75, 1.0}) from each of the viewpoints
"Team A" and "Team B" as described above, the condition is set, for instance, so that only scenes having an importance
of 0.5 or higher from the viewpoint of "Team A" are selected.
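The three kinds of condition setting described above can be combined in a small sketch. Everything here is hypothetical: the function `set_regeneration_condition`, the shape of the analyzed hint (`descriptors`, `max_depth`, `importance_values`), and the descriptor names in `FEATURE_DESCRIPTORS`. Snapping the requested threshold to an assumable value is one illustrative design choice, not a requirement of the specification.

```python
# Hypothetical names for descriptors that express characteristic amounts.
FEATURE_DESCRIPTORS = {"ColorHistogram", "MotionComplexity"}


def set_regeneration_condition(hint, supports_feature_search, client_max_depth,
                               viewpoint, min_importance):
    """Derive a re-generation condition from analysed hint information.

    hint: dict with "descriptors" (names present), "max_depth" (deepest
    hierarchical level) and "importance_values" (assumable values).
    """
    drop = set()
    if not supports_feature_search:
        # Characteristic-amount descriptors are useless without feature search.
        drop = FEATURE_DESCRIPTORS & set(hint["descriptors"])
    # Never ask for more depth than the metadata actually has.
    depth = min(client_max_depth, hint["max_depth"])
    # Snap the requested threshold to the smallest assumable value >= it.
    threshold = min((v for v in hint["importance_values"] if v >= min_importance),
                    default=None)
    return {"drop": drop, "max_depth": depth,
            "viewpoint": viewpoint, "min_importance": threshold}


hint = {"descriptors": ["VideoSegment", "Importance", "ColorHistogram", "Title"],
        "max_depth": 4, "importance_values": [0.0, 0.25, 0.5, 0.75, 1.0]}
print(set_regeneration_condition(hint, False, 2, "Team A", 0.5))
# {'drop': {'ColorHistogram'}, 'max_depth': 2, 'viewpoint': 'Team A', 'min_importance': 0.5}
```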

[0155] The condition 83 for metadata re-generation set by the metadata re-generation condition setting unit 82 is
sent to the metadata delivery server. On the metadata delivery server side, the metadata is restructured based on the
metadata re-generation condition and the terminal performance of the client terminal. For instance, when the maximum
depth of the hierarchical structure of the original metadata is four and the depth processible by the client terminal is
set at two in the metadata re-generation condition, the metadata is restructured so that the maximum depth of its
hierarchical structure becomes two.
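Restructuring the metadata to a maximum depth can be sketched as a recursive copy that stops descending at the limit. The sketch assumes the hypothetical nested-dictionary representation used for illustration elsewhere (a `children` key), counts the root as depth 1 (an assumption), and simply discards deeper segments; a real server might instead merge their descriptions upward.

```python
def limit_depth(segment, max_depth, level=1):
    """Return a copy of `segment` with every node below `max_depth` removed."""
    pruned = {k: v for k, v in segment.items() if k != "children"}
    if level < max_depth:
        pruned["children"] = [limit_depth(c, max_depth, level + 1)
                              for c in segment.get("children", [])]
    return pruned


def depth(segment):
    """Depth of the hierarchy, counting `segment` itself as 1."""
    return 1 + max((depth(c) for c in segment.get("children", [])), default=0)


tree = {"id": "root", "children": [
    {"id": "a", "children": [{"id": "b", "children": [{"id": "c"}]}]}]}
print(depth(tree), depth(limit_depth(tree, 2)))
# 4 2
```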
[0156] Also, when the metadata re-generation condition has been set so that only scenes having an importance of 0.5 or
higher from the viewpoint of "Team A" are selected, metadata composed only of the scenes matching the condition is
re-generated. As a result, as in the fifth embodiment described above, the metadata re-generation can be performed
efficiently using the metadata optimization hint information.

[0157] As described above, in the seventh embodiment, the metadata re-generation condition is set using the metadata
optimization hint information 60, so it becomes possible to generate appropriate metadata in accordance with the
client terminal or application.

Eighth Embodiment 

[0158] In the fifth embodiment and the sixth embodiment described above, there has been described a server that
re-generates metadata using the metadata optimization hint information and delivers the re-generated metadata. In
this eighth embodiment, however, a content delivery server (content delivery apparatus) will be described which
analyzes metadata using the metadata optimization hint information, re-generates content suited for the client terminal
or user preferences using a result of the analysis, and delivers the re-generated content.

[0159] The content delivery server according to the eighth embodiment of the present invention will be described
with reference to the accompanying drawings. FIG. 23 is a block diagram showing a construction of the content delivery
server according to the eighth embodiment of the present invention.

[0160] Referring to FIG. 23, a content delivery server 500A includes a hint information analysis unit 61, a metadata
analysis unit 86, and a content restructuring/delivery unit 88.
[0161] Next, how the content delivery server according to the eighth embodiment operates will be described with
reference to the accompanying drawings. 

[0162] The hint information analysis unit 61 operates in the same manner as in the fifth embodiment described above,
and therefore the description thereof is omitted in this embodiment.

[0163] The metadata analysis unit 86 analyzes the metadata 49 using the analyzed metadata optimization hint
information 62 outputted from the hint information analysis unit 61, and extracts each description matching a condition
85 concerning content restructuring, such as information concerning the client terminal or user preferences. The
analysis using the hint information is the same as that in the fifth embodiment described above. However, this eighth
embodiment differs from the fifth embodiment in that the extracted descriptions are used for content restructuring
rather than for metadata re-generation. Each description extracted by the metadata analysis unit 86, that is, the
analyzed metadata 87, is outputted to the content restructuring/delivery unit 88.

[0164] The content restructuring/delivery unit 88 restructures content 89 based on each description extracted by the
metadata analysis unit 86. The following description is based on the example described in the fifth embodiment above,
in which only video segments having an importance of 0.5 or higher are extracted from the metadata 49 and metadata
composed only of descriptions of the extracted video segments is re-generated.

[0165] In a like manner, in this eighth embodiment, only video segments having an importance of 0.5 or higher are
extracted from the metadata 49, and content 90 composed only of the scenes corresponding to the extracted video
segments is restructured and delivered. The description of each extracted video segment states the location of the
corresponding content and the position (time information) of the video segment in the content. Therefore, it is
possible to clip each corresponding scene from the content, restructure a single content 90 from the clipped scenes,
and deliver the restructured content 90. Alternatively, each corresponding scene can be clipped from the content and
the clipped scenes delivered sequentially.
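The clipping step can be sketched as building an edit list from the extracted segment descriptions. The keys `location`, `start`, `end`, and `importance` are hypothetical stand-ins for the content location and time information mentioned above. Merging touching or overlapping clips from the same source is an extra design choice of this sketch, made so that the restructured content plays without needless cuts; the actual cutting and encoding of media is out of scope.

```python
def build_clip_list(segments, threshold=0.5):
    """Return (location, start, end) clips for segments with importance >=
    threshold, sorted by source and time, with touching clips merged."""
    picked = sorted((s["location"], s["start"], s["end"]) for s in segments
                    if s["importance"] >= threshold)
    clips = []
    for loc, start, end in picked:
        if clips and clips[-1][0] == loc and start <= clips[-1][2]:
            prev = clips[-1]                       # extend the previous clip
            clips[-1] = (loc, prev[1], max(prev[2], end))
        else:
            clips.append((loc, start, end))
    return clips


segments = [
    {"location": "match.mp4", "start": 30, "end": 45, "importance": 0.75},
    {"location": "match.mp4", "start": 0, "end": 30, "importance": 0.5},
    {"location": "match.mp4", "start": 60, "end": 70, "importance": 0.25},
    {"location": "match.mp4", "start": 90, "end": 100, "importance": 1.0},
]
print(build_clip_list(segments))
# [('match.mp4', 0, 45), ('match.mp4', 90, 100)]
```

The resulting list can drive either delivery mode: concatenating the clips into a single content 90, or streaming them one after another.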

[0166] As described above, with the content delivery server 500A according to the eighth embodiment, the metadata
analysis is performed using the metadata optimization hint information 60, which describes a list of the descriptors
contained in the metadata 49, the appearing position of each descriptor, the number of appearing times, and the like,
so it becomes possible to omit the analysis of the metadata 49 itself for the metadata re-generation. Also, the analysis of
each descriptor not matching the re-generation condition is omitted using the appearing position and the number of
appearing times, so it becomes possible to reduce the processing cost (such as the processing amount and the
memory usage amount) required to perform the metadata analysis and the content restructuring when content suited
for the client terminal and user preferences is re-generated and delivered.

INDUSTRIAL APPLICABILITY 

[0167] As described above, with the present invention, multimedia content containing moving pictures and audio is
divided into multiple scenes, editing of the multiple scenes is performed, and metadata that is scene structure
information describing the hierarchical structure of the multimedia content is generated. As a result, it becomes possible
to generate metadata describing the hierarchical structure possessed by multimedia content containing video data and
the like.

Claims 

1. A metadata editing apparatus comprising:

a scene division unit for dividing multimedia content containing at least one of moving pictures and audio into
a plurality of scenes to generate scene section information metadata indicating a scene start position and a
scene end position for each scene obtained as a result of the division;

a scene description edit unit for performing hierarchical editing of each scene of the multimedia content based
on the scene section information metadata sent from the scene division unit and generating scene structure
information metadata describing a hierarchical structure of the multimedia content; and

a metadata description unit for integrating the scene section information metadata and the scene structure
information metadata and generating metadata describing contents and a structure of the multimedia content
in accordance with a predetermined format.

2. A metadata editing apparatus according to claim 1, further comprising:

a characteristic extraction unit for extracting visual characteristic amounts of each scene of the multimedia
content based on the scene section information metadata sent from the scene division unit and generating
characteristic description metadata,

wherein the metadata description unit integrates the scene section information metadata, the scene structure 
information metadata, and the characteristic description metadata and generates the metadata describing the 
contents and the structure of the multimedia content in accordance with the predetermined format. 

3. A metadata editing apparatus according to claim 1 or 2, further comprising:

a scene change detection unit for detecting each scene change point of the multimedia content based on an 
inter-frame difference, 

wherein the scene division unit divides the multimedia content into the plurality of scenes based on the scene 
change point. 

4. A metadata reproduction apparatus comprising:

a metadata analysis unit for analyzing metadata describing contents and a structure of multimedia content 
containing at least one of moving pictures and audio; 

a search unit for searching the metadata for each scene matching a predetermined search condition based 
on visual characteristic amounts of each scene described in the metadata analyzed by the metadata analysis 
unit; and 

a summary creation unit for creating summary information of the multimedia content matching a predetermined 
summary creation condition based on predetermined information described in the metadata analyzed by the 
metadata analysis unit. 

5. A metadata delivery apparatus comprising:

a metadata analysis unit for analyzing metadata describing contents and a structure of multimedia content 
containing at least one of moving pictures and audio; 

a terminal capability judgment unit for judging a capability of a client terminal based on information concerning
performance of the client terminal; 

a metadata re-generation unit for restructuring the multimedia content in accordance with the judged capability 
of the client terminal based on a result of the metadata analysis by the metadata analysis unit and re-generating 
second metadata describing contents of the restructured multimedia content; and 

a metadata delivery unit for delivering the second metadata re-generated by the metadata re-generation unit 
to the client terminal. 

6. A metadata delivery apparatus comprising:

a hint information analysis unit for analyzing metadata optimization hint information describing a type of each 
descriptor contained in metadata; 

a metadata analysis/re-generation unit for analyzing metadata describing contents and a structure of multi- 
media content containing at least one of moving pictures and audio based on the analyzed metadata optimi- 
zation hint information and a condition for metadata re-generation and re-generating second metadata; and 
a metadata delivery unit for delivering the second metadata re-generated by the metadata analysis/re-gener- 
ation unit to a client terminal. 

7. A metadata search apparatus comprising:

a hint information analysis unit for analyzing metadata optimization hint information describing a type and 
contents of each descriptor contained in metadata; 

a metadata analysis unit for analyzing the metadata describing contents and a structure of multimedia content 
containing at least one of moving pictures and audio based on the analyzed metadata optimization hint infor- 
mation and a search condition; and 

a search unit for searching content matching the search condition using a result of the analysis of the metadata.

8. A metadata re-generation condition setting apparatus comprising:

a hint information analysis unit for analyzing metadata optimization hint information describing a type and
contents of each descriptor contained in metadata; and 

a metadata re-generation condition setting unit for setting a condition for re-generation of metadata describing 
contents and a structure of multimedia content containing at least one of moving pictures and audio based on 
the analyzed metadata optimization hint information.

9. A content delivery apparatus comprising: 

a hint information analysis unit for analyzing metadata optimization hint information describing a type and
contents of each descriptor contained in metadata;

a metadata analysis unit for extracting each description matching a condition for content restructuring from 
the metadata describing contents and a structure of multimedia content containing at least one of moving 
pictures and audio based on the analyzed metadata optimization hint information and the condition for the 
content restructuring; and 

a content restructuring/delivery unit for restructuring the content based on the extracted description and
delivering the restructured content to a client terminal.

10. A metadata delivery method comprising the steps of: 

analyzing metadata optimization hint information describing a type of each descriptor contained in metadata;

re-generating second metadata by analyzing the metadata describing contents and a structure of multimedia 
content containing at least one of moving pictures and audio based on the analyzed metadata optimization 
hint information and a condition for re-generation of the metadata; and 
delivering the re-generated second metadata to a client terminal. 

11. A metadata delivery method according to claim 10, 

wherein the metadata optimization hint information describes a location of a metadata file and an appearing 
element number showing a number of elements contained in the metadata as metadata file information. 

12. A metadata delivery method according to claim 10,

wherein the metadata optimization hint information describes a size of a metadata file, a format of the meta- 
data file, and syntax file information as metadata file information. 

13. A metadata delivery method according to claim 10, 

wherein the metadata optimization hint information describes a name of the descriptor contained in the
metadata as one piece of metadata construction element information.

14. A metadata delivery method according to claim 10, 

wherein the metadata optimization hint information describes a frequency, at which the descriptor contained
in the metadata appears, as one piece of metadata construction element information.

15. A metadata delivery method according to claim 10, 

wherein the metadata optimization hint information describes a completeness of description of the descriptor 
contained in the metadata as one piece of metadata construction element information. 

16. A metadata delivery method according to claim 10, 

wherein the metadata optimization hint information describes a temporal hierarchical property possessed by 
the descriptor contained in the metadata as one piece of metadata construction element information. 

17. A metadata delivery method according to claim 10,

wherein the metadata optimization hint information describes a spatial hierarchical property possessed by 
the descriptor contained in the metadata as one piece of metadata construction element information. 

18. A metadata delivery method according to claim 10, 

wherein the metadata optimization hint information describes an appearing position at which the descriptor
contained in the metadata appears, as one piece of metadata construction element information.

19. A metadata delivery method according to claim 10, 
wherein the metadata optimization hint information describes a type of the descriptor contained in the
metadata as one piece of metadata construction element information.

20. A metadata delivery method according to claim 10, 

wherein the metadata optimization hint information describes an assumable value range of the descriptor 
contained in the metadata as one piece of metadata construction element information. 